A focused crawler based on semantic disambiguation vector space model


Abstract

The focused crawler continuously grabs web pages related to a given topic according to the priorities of unvisited hyperlinks. In many previous studies, focused crawlers predict the priorities of unvisited hyperlinks based on text similarity models. However, the terms representing the topic and the page ignore the phenomenon of polysemy, and cosine similarity cannot combine text and semantic similarity effectively. To address these problems, this paper proposes a semantic disambiguation vector space model (SDVSM). The SDVSM method combines a semantic disambiguation graph (SDG) with a semantic vector space model (SVSM). The SDG is used to remove ambiguous and irrelevant terms from retrieved pages. The SVSM calculates the topic–page similarity by constructing vectors from TF × IDF weights of terms and semantic similarities between terms. The experiment results indicate that the proposed method can improve crawler performance across different evaluation indicators compared with four baseline crawlers. In conclusion, the proposed method enables a focused crawler to grab relevant web pages of higher quality and in greater quantity from the Internet.
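The core scoring step the abstract describes can be illustrated as follows. This is a minimal sketch, not the paper's SDVSM implementation: it builds TF × IDF weight vectors for a topic and a page and ranks by cosine similarity, which is the baseline the SDVSM extends with semantic similarities. All names and the smoothed-IDF choice are this sketch's own assumptions.

```python
# Illustrative sketch (not the paper's SDVSM): score a candidate page
# against a topic by cosine similarity over TF x IDF weight vectors.
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Build sparse TF x IDF vectors for a list of tokenized documents.

    Uses a smoothed IDF (log((1+n)/(1+df)) + 1) so that terms appearing
    in every document still carry nonzero weight in this tiny corpus.
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical topic and page; a real crawler would tokenize fetched HTML.
topic = "semantic focused crawler".split()
page = "a focused crawler grabs pages by semantic similarity".split()
vec_topic, vec_page = tf_idf_vectors([topic, page])
score = cosine(vec_topic, vec_page)
```

In a focused crawler, `score` would serve as the priority of an unvisited hyperlink in the frontier queue; the paper's contribution is replacing this plain cosine with a measure that also accounts for disambiguated semantic relations between terms.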


Similar Articles

Chinese Personal Name Disambiguation Based on Vector Space Model

This paper introduces the Chinese personal name disambiguation task of the Second CIPS-SIGHAN Joint Conference on Chinese Language Processing (CLP) 2012, in which the Natural Language Processing Laboratory of Zhengzhou University took part. In this task, we mainly use the Vector Space Model to disambiguate Chinese personal names. We extract different named entity features from diverse names informa...


A Focused Crawler Based on Correlation Analysis

With the rapid development of network and information technology, there are huge amounts of data on the internet. However, a major problem faced by many researchers is how to effectively filter out information on a particular subject or field from these data. In this paper, we try to build a focused crawler based on the vector space model and TFIDF text correlation analysis. We...


An Ontology-Based Focused Crawler

In this paper we present a novel approach for building a focused crawler. The goal of our crawler is to effectively identify web pages that relate to a set of predefined topics and download them regardless of their web topology or connectivity with other popular pages on the web. The main challenges that we address in our study concern the following. First we need to be able to effectively iden...


Mortality Forecasting Based on the Lee-Carter Model

Over the past decades a number of approaches have been applied for forecasting mortality. In 1992, a new method for long-run forecasting of the level and age pattern of mortality was published by Lee and Carter. This method was welcomed by many authors, so it was extended to a wider class of generalized, parametric and nonlinear models. This model represents one of the most influential recent d...


Journal

Journal title: Complex & Intelligent Systems

Year: 2022

ISSN: 2198-6053, 2199-4536

DOI: https://doi.org/10.1007/s40747-022-00707-8